This Project
This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein.
Background
Picture 1. Image of the Structure of the Protein for which NOD1 Codes
This image shows the structure of nucleotide-binding oligomerization domain-containing protein 1 (NOD1). The image is from Wikipedia.
The NOD1 gene codes for nucleotide-binding oligomerization domain-containing protein 1, and the gene is found on chromosome 7 ("Genecards" 2014). The protein is 953 amino acids long and plays an important role in caspase-9-mediated cell death, or apoptosis ("Genecards" 2014). The NOD1 protein functions in the cytosol and recognizes peptidoglycans, which are found in bacterial cell walls (Strober et al. 2006; Kim et al. 2004). Once a pathogen is recognized, NOD1 activates the cell's immune response. One way it does this is by signaling for the activation of NF-kappa-B, which controls the inflammatory response in a cell (Kim et al. 2004). NOD1 is found primarily in epithelial cells and antigen-presenting cells (Strober et al. 2006). When it functions improperly, it has been implicated in diseases such as inflammatory bowel disease and ulcerative colitis ("Genecards" 2014), as well as asthma and Crohn's disease (Strober et al. 2006).
Methods
The sequence for the assigned protein was obtained from the Ensembl protein database in FASTA format. Using BLAST on the whaleshark.georgiaaquarium.org Galaxy server, the human protein sequence (ID ENSP00000222823) was queried against the whale shark protein database. The top five protein hits were then cross-checked by using each as a query in a BLAST search against the human protein database at NCBI. Alignment lengths, E-values, query coverages, and percent identities were used to determine the closest matches. Predicted NOD1 orthologs were obtained from bottlenose dolphins, elephant sharks, zebrafish, and dogs, using the human protein FASTA as the query in NCBI BLAST searches for each organism. The best hit for each organism and the whale shark proteins from the BLAST searches were used to create a phylogenetic tree with ClustalW.
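The cross-check described above is essentially a reciprocal best-hit test. A minimal sketch of that logic follows, assuming each BLAST result has already been reduced to (subject ID, E-value) pairs. The function names, identifiers, and E-values are hypothetical and for illustration only; they are not results from this project.

```python
def best_hit(hits):
    """Return the subject ID with the lowest E-value from (subject_id, evalue) pairs."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(query_id, forward_hits, reverse_hits_by_subject):
    """True if the query's best hit has the query as its own best hit in the reverse search."""
    candidate = best_hit(forward_hits)
    if candidate is None:
        return False
    return best_hit(reverse_hits_by_subject.get(candidate, [])) == query_id


# Made-up identifiers and E-values (assumptions, not data from this page).
forward = [("shark_A", 1e-40), ("shark_B", 1e-35)]
reverse = {
    "shark_A": [("human_X", 1e-60), ("human_query", 1e-38)],
    "shark_B": [("human_query", 1e-30)],
}

# shark_A is the best forward hit, but its best human hit is human_X,
# so the pair is not a reciprocal best hit.
print(is_reciprocal_best_hit("human_query", forward, reverse))
```

A candidate that fails this test can still be a homolog; the test only flags whether the two sequences pick each other first.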
Searching the Whale Shark Genome
Table 1. Potential Whale Shark Proteins that Match the Human NOD1 gene
This table lists the five best protein matches found when the human FASTA sequence was used as the query in the Galaxy database, and shows how each potential whale shark protein was cross-checked against the human database on the NCBI BLAST site. Listed is the top protein hit returned when each potential whale shark sequence was used as the query. Note that nucleotide-binding oligomerization domain-containing protein 1 was never the best-matching protein. The units for "Protein Length" and "Alignment Length" are amino acids.
Table 2. Potential Whale Shark Proteins and How Closely They Match the Human NOD1 Sequence
This table displays how well each potential protein matched human nucleotide-binding oligomerization domain-containing protein 1. Because NOD1 was never the top human hit for any candidate, a separate table was made to show how well each potential whale shark protein matched NOD1 itself.
The whale shark genome was searched for the NOD1 gene. The human FASTA for nucleotide-binding oligomerization domain-containing protein 1 was obtained from the Ensembl site, and the five best matches were selected based on their low E-values and the length of their alignment with the human query. Each of the five potential proteins was then cross-checked on NCBI BLAST by using the whale shark FASTA as a query against the human protein database. The cross-checked results are in Table 1. For each of the five potential proteins, the top human hit was a protein other than nucleotide-binding oligomerization domain-containing protein 1. However, four of the five potential proteins did return nucleotide-binding oligomerization domain-containing protein 1 as a hit; it just was not listed as the best match. These results are summarized in Table 2. The protein g37533.t1 has its categories listed as "N/A" because it was the one protein that did not return nucleotide-binding oligomerization domain-containing protein 1. It did, however, return two different isoforms of the protein, which suggests that it is a closely related sequence. It makes sense that each of the proteins matched to some extent because they all belong to similar families, such as ABC_ATPase. The protein g46108 is believed to be the most closely related to NOD1. Even though it does not have the smallest E-value in Table 2, it has 85% alignment coverage, and 26% of the aligned positions are identical amino acids. An ortholog may not exist in the whale shark genome, because the cross-check searches returned proteins other than nucleotide-binding oligomerization domain-containing protein 1 as the best match. However, NOD1 was listed as a hit for four of the five potential sequences, and all of the potential proteins belong to similar superfamilies, which are explained in greater detail in the next section.
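The coverage and identity figures used above can be computed directly from a pairwise alignment. The sketch below assumes the convention that percent identity is identical positions divided by alignment length (gaps included) and that query coverage is aligned query residues divided by full query length. The sequences are toy strings, not the whale shark or human proteins.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent_identity, query_coverage) for one gapped pairwise alignment.

    aligned_query and aligned_subject are equal-length strings with '-' for gaps.
    query_length is the length of the full, unaligned query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query or query_length <= 0:
        raise ValueError("alignment and query length must be non-empty")

    # Count columns where both sequences carry the same residue (gaps never count).
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Count query residues that take part in the alignment.
    query_residues = sum(1 for q in aligned_query if q != "-")

    percent_identity = 100.0 * identical / len(aligned_query)
    query_coverage = 100.0 * query_residues / query_length
    return percent_identity, query_coverage


# Toy alignment (illustrative only): 8 columns, 5 identical, 7 query residues
# aligned out of a hypothetical 10-residue query.
identity, coverage = alignment_stats("MKT-LLVA", "MRTALL-A", query_length=10)
print(identity, coverage)
```

High coverage with low identity, as reported for g46108, means the alignment spans most of the query but the residues within it have diverged considerably.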
Protein Domains
Picture 2. Protein Domains in the Nucleotide-binding oligomerization domain-containing protein 1
This picture is from the NCBI BLAST site and shows the various domains and superfamilies that nucleotide-binding oligomerization domain-containing protein 1 contains.
CARD Domain
The CARD domain present in the protein is one of the domains in the Death Domain superfamily. CARD sequences are found in animals as well as “plants, fungi and prokaryotes” (Lahm et al. 2003). CARD protein sequences play a role in protein recognition that leads to apoptosis (Lahm et al. 2003). NOD1 plays a role in the recruitment of caspase-9 (“Genecards” 2014), which is an “apoptotic initiator” (Shi et al. 2014).
NACHT Family
The NACHT abbreviation comes from “NAIP (neuronal apoptosis inhibitory protein), CIITA (MHC class II transcription activator), HET-E (incompatibility locus protein from Podospora anserina) and TP1 (telomerase-associated protein)” (Damiano et al. 2004). NACHT proteins are important in regulating inflammation, and mutations in the NACHT sequence are associated with diseases such as Crohn’s disease (Damiano et al. 2004). This is consistent with the research cited in the background, which states that mutations in the NOD1 gene can adversely affect the bowels through disorders such as Crohn’s disease (Strober et al. 2006).
LRR_RI Superfamily
LRR stands for leucine-rich repeat. LRRs typically fold into a horseshoe shape and provide structural support so that the protein can interact with other proteins (Enkhbayar et al. 2004). Each repeat is 20-30 amino acids long and can be repeated up to 42 times (Enkhbayar et al. 2004). LRRs are found in “viruses to eukaryotes” (Enkhbayar et al. 2004).
Orthologues
Picture 3. Elephant Shark BLAST Results
This is a picture of the results obtained by performing a BLAST search of the human NOD1 sequence against the elephant shark database on the NCBI BLAST site. The best protein match is shown by the red line, which indicates that more than 200 amino acids match between the two sequences. The protein also belongs to the Death Domain superfamily.
Since the human FASTA sequence for NOD1 did not return a clear match among the whale shark sequences, it was next compared to the elephant shark, because the two species are known to be closely related. The NOD1 gene did have an ortholog in the elephant shark. A BLAST search with the NOD1 FASTA sequence returned a close protein match with an extremely low E-value, reported as 0.0. The elephant shark protein also had the same name and was part of the Death Domain superfamily. These three pieces of evidence led to the conclusion that the NOD1 gene has an ortholog in the elephant shark.
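To see why an E-value can be reported as 0.0, it helps to recall the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below applies that relation with made-up bit scores and a made-up database size. It uses raw lengths rather than the effective lengths that BLAST computes, so it is a simplification and not a reproduction of BLAST output.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits: E = m * n * 2^(-S') for bit score S'."""
    return query_length * database_length * 2.0 ** (-bit_score)


QUERY_LENGTH = 953            # NOD1 length in amino acids, as stated in the Background
DATABASE_LENGTH = 1_000_000   # hypothetical total residues in the searched database

# Hypothetical bit scores: a weak hit, a solid hit, and a very strong hit.
for score in (30, 100, 1000):
    e_value = expected_hits(score, QUERY_LENGTH, DATABASE_LENGTH)
    # With one decimal place, a sufficiently small E-value prints as 0.0.
    print(score, e_value, f"{e_value:.1f}")
```

An E-value shown as 0.0 is therefore a display of a number too small to print at that precision, not a literal probability of zero.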
Phylogeny
Picture 4. Phylogenetic Tree of Human Protein, Potential Whale Shark Proteins, and Other Species
This phylogenetic tree displays the relationships among the potential whale shark proteins and the proteins from the zebrafish, the elephant shark, humans, dogs, and the bottlenose dolphin. Notice the divergence between the whale shark proteins and those of the other species. The proteins from the other species show little resemblance to the whale shark proteins, which suggests that the whale shark has no exact match for this protein and that the sequence has diverged. The human protein is more closely related to those of other species such as dogs, bottlenose dolphins, and zebrafish. One potential whale shark protein, g36672, could not be placed in the phylogenetic tree due to technical issues.
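Trees like this one are typically built from pairwise distances between aligned sequences. As a rough sketch of that first step, the code below computes a simple p-distance (the fraction of differing positions, ignoring gapped columns) for every pair in a toy alignment. This is an illustrative stand-in and not ClustalW's actual distance calculation; the sequence names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing positions, skipping columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Map each (name_1, name_2) pair, in sorted name order, to its p-distance."""
    return {
        (first, second): p_distance(alignment[first], alignment[second])
        for first, second in combinations(sorted(alignment), 2)
    }


# Toy multiple alignment (made-up sequences, for illustration only).
alignment = {
    "seq_A": "MKTLLVAG",
    "seq_B": "MKTLLVSG",
    "seq_C": "MRSALV-G",
}
matrix = distance_matrix(alignment)
closest = min(matrix, key=matrix.get)
print(matrix)
print("closest pair:", closest)
```

A tree-building method then groups the closest pairs first, which is why sequences with small pairwise distances end up on neighboring branches.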
Conclusions
The role of the NOD1 gene in the immune response makes it an important gene in humans. Since no clear ortholog of the gene was found in the whale shark, it is possible that a large portion of the protein sequence was significantly modified or lost over time. The phylogenetic tree suggests that whale sharks and humans share only a very distant common ancestor for this protein.
Because the whale shark and human sequences were so distant, orthologs of the NOD1 gene were sought in other species. As a "plan B," the FASTA sequence for human NOD1 was compared to that of the elephant shark. The high resemblance between the human and elephant shark proteins suggests that the elephant shark protein may be a key to discovering how the gene diverged from that of the common ancestor. This will be helpful because the similarities between the human and whale shark proteins are not clearly evident. It might also give insight into how variation in the gene affects its function in both species.
References
Damiano, Jason S., Oliveira, Vasco, Welsh, Kate, Reed, John C. “Heterotypic interactions among NACHT domains: implications for regulation of innate immune responses”. Biochem. J. 381 (2004): 213–219. Web. 13 April 2015. <http://www.biochemj.org/bj/381/bj3810213.htm>.
Enkhbayar, Purevjav, Kamiya, Masakatsu, Osaki, Mitsuru, Matsumoto, Takeshi, Matsushima, Norio. “Structural principles of leucine-rich repeat (LRR) proteins”. Proteins 54(3) (15 February 2004): 394-403. Web. 13 April 2015. <http://www.ncbi.nlm.nih.gov/pubmed/14747988>.
Kim, Jae Gyu, Sung Joong Lee, and Martin F. Kagnoff. “Nod1 Is an Essential Signal Transducer in Intestinal Epithelial Cells Infected with Bacteria That Avoid Recognition by Toll-Like Receptors.” Infection and Immunity 72.3 (2004): 1487–1495. PMC. Web. 10 Apr. 2015. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC356064/>.
Lahm, Armin, Paradisi, Andrea, Green, Douglas R., Melino, Gerry. “Death Fold Domain Interactions in Apoptosis”. Cell Death and Differentiation 10 (2003): 10–12. Web. 13 April 2015. <http://www.nature.com/cdd/journal/v10/n1/full/4401203a.html>.
“Protein NOD1 PDB 2b1w”. Wikipedia. Wikipedia, 15 December 2009. Image. 13 April 2015. <http://en.wikipedia.org/wiki/NOD1>.
“Nucleotide-Binding Oligomerization Domain Containing 1”. GeneCards. Gene Cards Suite. May 7 2014. Web. 31 March 2015. <http://www.genecards.org/cgi-bin/carddisp.pl?gene=NOD1>.
Strober, Warren, Murray, Peter J., Kitani, Atsushi, Watanabe, Tomohiro. “Signaling pathways and molecular interactions of NOD1 and NOD2.” Nature Reviews Immunology 6 (January 2006): 9-20. Web. 10 April 2015. <http://www.nature.com/nri/journal/v6/n1/full/nri1747.html>.
Shi, Jianjin, Zhao, Yue, Wang, Yupeng, Gao, Wenqing, Ding, Jingjin, Li, Peng, Hu, Liyan, Shao, Feng. “Inflammatory Caspases are Innate Immune Receptors for Intracellular LPS”. Nature 514 (06 August 2014): 187-192. Web. 13 April 2015. <http://www.nature.com/nature/journal/v514/n7521/full/nature13683.html#discussion>.